Abstract 

Background: Polycomb Repressive Complex 2 (PRC2) is an essential regulator of gene expression that maintains 
genes in a repressed state by marking chromatin with trimethylated Histone H3 lysine 27 (H3K27me3). In 
Arabidopsis, loss of PRC2 function leads to pleiotropic effects on growth and development thought to be due to 
ectopic expression of seed and embryo-specific genes. While there is some understanding of the mechanisms by 
which specific genes are targeted by PRC2 in animal systems, it is still not clear how PRC2 is recruited to specific 
regions of plant genomes. 

Results: We used ChIP-seq to determine the genome-wide distribution of hemagglutinin (HA)-tagged 
FERTILIZATION INDEPENDENT ENDOSPERM (FIE-HA), the Extra Sex Combs homolog protein present in all 
Arabidopsis PRC2 complexes. We found that the FIE-HA binding sites co-locate with a subset of the H3K27me3 sites 
in the genome and that the associated genes were more likely to be de-repressed in mutants of PRC2 
components. The FIE-HA binding sites are enriched for three sequence motifs including a putative GAGA factor 
binding site that is also found in Drosophila Polycomb Response Elements (PREs). 

Conclusions: Our results suggest that PRC2 binding sites in plant genomes share some sequence features with 
Drosophila PREs. However, unlike Drosophila PREs which are located in promoters and devoid of H3K27me3, 
Arabidopsis FIE binding sites tend to be in gene coding regions and co-localize with H3K27me3. 
Background 

The Polycomb group (PcG) proteins are found across 
the higher eukaryotes and are essential for normal devel- 
opment. PcG proteins were first identified in Drosophila 
where they are required to maintain repression of home- 
otic genes [1] and have since been shown to be required 
for the correct expression of many genes in plants and 
animals. The polycomb proteins make up two major 
protein complexes, Polycomb Repressive Complex 1 
(PRC1) and PRC2 [1-5], which are conserved in animals 
and plants. PRC2 catalyses trimethylation of histone H3 
lysine 27 (H3K27me3). The H3K27me3 is bound by 
PRC1, which ubiquitinates histone H2A [4,6], resulting in 
a compacted chromatin state that can be inherited 
through mitotic divisions. 

Plants have clear homologs of the four core protein 
components of PRC2, often with multiple genes encoding 
each component [7]. In Arabidopsis, FERTILIZATION 
INDEPENDENT ENDOSPERM (FIE) is the single Ex- 
tra Sex Combs (ESC) homolog, CURLY LEAF (CLF), 
SWINGER (SWN) and MEDEA are Enhancer of Zeste 
homologs, FERTILIZATION INDEPENDENT SEED 2, 
VERNALIZATION 2 and EMBRYONIC FLOWER 2 are 
Suppressor of Zeste 12 homologs and MULTICOPY 
SUPPRESSOR OF IRA (MSI) 1-5 are homologs of NURF55. 
Of the MSIs, MSI1 appears to be a component of 
PRC2 complexes, linking PRC2 to LIKE HETEROCHROMATIN 
PROTEIN 1 [8], a protein that has PRC1-like 
function in Arabidopsis, while other MSIs (e.g. MSI4/ 
FVE) have roles outside the PRC2 complex [7,9]. A loss of 
PRC2 activity in Arabidopsis, such as in clf swn double 
mutants and FIE RNAi plants leads to strong develop- 
mental defects, especially in organ identity [10,11]. Whole 
genome chromatin immunoprecipitation (ChIP) experi- 
ments have shown that about 20% of Arabidopsis genes 
are marked by H3K27me3 [12-17]. The H3K27me3 tar- 
gets have an over-representation of genes that are highly 
regulated as opposed to being constitutively expressed. 
H3K27me3 is generally associated with genes with low 
transcription activity [3,16] consistent with H3K27me3 
having a role in maintaining repression of gene expression. 
Although large numbers of gene loci have H3K27me3 
present, only a minority are de-repressed in vegetative 
tissues of plants mutant for PRC2 components [13]. 
This indicates that in these tissues H3K27me3 is only 
critical for maintaining the repression of a subset of the 
H3K27me3-marked genes and presumably loss of 
H3K27me3 at other loci does not lead to their increased 
expression due to the absence of specifically expressed 
transcription factors. 

A major unanswered question in understanding Polycomb 
repression in plants is how specific loci are targeted by 
the Polycomb complexes. Polycomb recruitment has 
been best characterised in Drosophila where regions of 
H3K27me3 are associated with sequence elements termed 
Polycomb Response Elements (PREs) [18]. Drosophila 
PREs are regions of up to a few hundred base pairs that 
were initially defined as being required to confer Polycomb 
repression on their target genes. PREs are able to recruit 
either PRC1 or PRC2 or both. PREs contain binding 
sites for sequence-specific DNA binding proteins. The 
binding sites for Pleiohomeotic (Pho) and the related Pho-like 
are a common element of Drosophila PREs, but they 
also contain sites for other DNA binding proteins including 
GAGA factor, Pipsqueak and Zeste. Genome-wide 
studies show that the binding sites for these other factors 
only partially overlap with PRC1 and PRC2 target sites 
and the extent to which they contribute to PcG recruitment 
is not always clear [4,18]. 

In mammalian systems, PREs are less well characterised 
with the best examples being a 3 kb region from the 
mouse MafB gene and a 1.8 kb region from the human 
HOXD cluster that confer PcG-dependent repression in 
reporter gene systems [19,20]. Both these elements contain 
binding sites for YY1 (the mammalian PHO homolog), 
suggesting that there is at least some conservation of the 
mechanisms of PcG recruitment between mammals and 
insects. Long non-coding RNAs (lncRNAs) have also been 
implicated in PcG recruitment in mammals. These can act 
in cis, such as the Xist and Kcnq1ot1 lncRNAs that are involved 
in PcG recruitment in X chromosome inactivation 
and imprinting respectively [21,22], or in trans, such as 
the HOTAIR lncRNA which is produced from the HOXC 
cluster and acts as a scaffold to recruit PRC2 to the unlinked 
HOXD locus [23,24]. 

At present less is known of the mechanisms by which 
genes are targeted by the PcG system in plants. There is 
some evidence for the presence of PRE-like sequences in 
plants. A 50 bp element (RLE) has been identified from 
the promoter of the Arabidopsis LEAFY COTYLEDON 
2 (LEC2) gene which is required for PcG repression and 
confers repression and H3K27me3 deposition on a 
transgene [25]. The LEC2 promoter also contains a 
GAGA element that is bound by Arabidopsis GAGA fac- 
tors in vitro. Mutation studies suggest that this GAGA 
element has an activator or enhancer function and is not 
required for H3K27me3 deposition [25]. A second example 
of a plant PRE-like sequence comes from the promoter of 
the BREVIPEDICELLUS (BP) gene. The ASYMMETRIC 
LEAVES 1 (AS1)-AS2 complex binds to defined sequences 
in the BP promoter to silence its expression [26]. The BP 
locus is marked by H3K27me3 which requires the AS1- 
AS2 complex. The AS1-AS2 complex interacts with PRC2 
components and the AS1-AS2 binding site from the BP 
promoter is sufficient to confer Polycomb repression on a 
GUS transgene. These properties are consistent with the 
AS1-AS2 binding site in BP functioning as a PRE at which 
AS1-AS2 recruits PRC2 [27]. 

Some evidence for lncRNAs being involved in Polycomb 
recruitment in plants comes from the FLC gene which en- 
codes a MADS box repressor of flowering [28,29]. FLC ex- 
pression is repressed by vernalisation (extended cold) and 
this repression is maintained in a PRC2-dependent man- 
ner following return to warm growing conditions [11,30]. 
Non-coding sense transcripts (named COLDAIR) pro- 
duced from the large first intron of FLC which are bound 
by the PRC component CLF, are required to maintain FLC 
repression in the cold [31], suggesting that the COLDAIR 
transcript recruits PRC2 to maintain FLC repression. 

To further explore the mechanisms of PcG recruitment 
in plants we carried out a ChIP-seq experiment to 
determine the genome-wide distribution of FIE, the sin- 
gle ESC homolog in Arabidopsis which should therefore 
be present in all PRC2 complexes. By comparing the FIE 
binding sites with genome-wide H3K27me3 distribution 
we found over seven hundred high confidence FIE bind- 
ing sites. The FIE binding sites were predominantly 
within gene bodies and were enriched for three sequence 
motifs including putative GAGA factor binding sites. 

Results 

Identification of FIE-HA binding sites by ChIP-seq 

The FIE protein is the only Arabidopsis homolog of the 
ESC protein and is present in all active PRC2 complexes. 
Hence ChIP using the FIE protein as a target should 
identify all sites of PRC2 interaction with the genome. 
We carried out ChIP-seq with a FIE-HA protein complementing 
the fie-11 mutant in the C24 ecotype [11] 
using aerial tissue of 12 day old seedlings. As the PRC2 
complex is thought to interact with either histones or 
DNA binding proteins and not directly with genomic 
DNA, we cross-linked the plant material prior to immu- 
noprecipitation to preserve the interaction of FIE-HA 
with its chromatin binding sites. In a parallel experiment, 
ChIP-seq was also carried out for H3K27me3 
using non cross-linked chromatin from wildtype C24 
seedlings. This native ChIP method is generally consid- 
ered superior for assessing histone modifications [32] 
while cross-linked ChIP is preferred for studying chromatin 
binding proteins. The ChIP-seq datasets were 
mapped to the Arabidopsis genome and regions that 
were enriched in the immunoprecipitations compared to 
input samples were identified as described in Methods. 
Figure 1 shows example data for the LEC2 and BP genes. 
This identified 1298 peaks in the FIE-HA data and 5148 
peaks in the H3K27me3 data using a q score of 10 ^^°. 
This q score was selected from a range that was tested on 
the basis that it identified the largest number of peaks 
from the ChIP samples with minimal false positive peaks 
from the input datasets (Additional file 1: Table S1; 
Additional file 2: Dataset S1, Additional file 3: Dataset S2). 

H3K27me3 abundance and distribution is conserved 
between C24 and Columbia ecotypes 

The number of H3K27me3 peaks we identified was simi- 
lar to that seen in previous analyses [16]. To determine 
whether these represent the same target sequences we 
compared the H3K27me3 distribution in our C24 
dataset to a Columbia (Col) ChIP-chip dataset [16]. The 
peaks identified in our C24 H3K27me3 dataset were 
assigned to 3976 genes, 77% of which were also present 
in the list of H3K27me3 genes from Col (Figure 2a). To 
verify this result and quantitatively compare the H3K27me3 
abundance between the two ecotypes, 10 genes were 
randomly selected from each of three groups; present in 
C24+Col, C24-only and Col-only. We determined the 
H3K27me3 abundance at the centre of the peaks for these 
30 genes, as well as for 10 regions with no H3K27me3, by 
ChIP-qPCR on Col and C24 seedlings grown under the 
same conditions (Figure 2b). We found that all 10 C24+Col 
genes had similar H3K27me3 abundance in both eco- 
types. We found that 18 out of 20 genes selected as 
having either C24- or Col-specific H3K27me3 actually 
had similar H3K27me3 abundance suggesting that 
there are false negatives in both datasets. Therefore there 
are very few differences in H3K27me3 location and 
abundance between ecotypes, in agreement with previous 
observations for Col versus Ler [33] and Col versus C24 
or Cvi [34]. 

FIE-HA peaks are associated with H3K27me3 

The genomic locations of the peaks from our FIE-HA 
and H3K27me3 ChIP-seq datasets were compared and 
grouped into three categories; FIE-HA+H3K27me3, FIE- 
HA only and H3K27me3 only (Figure 3a; Additional file 
4: Dataset S3). We found that 723 of the FIE-HA peaks 
overlapped with H3K27me3 peaks (an overlap of 1 bp or 
more). To verify the ChIP-seq results we then selected 
two peaks at random from each of five confidence value 
bins for each category of ChIP-seq peak for further analysis 
by ChIP-qPCR (detailed in Methods). In addition to 
the 10 peaks for each of the three groups, we included 
the 10 regions with no H3K27me3 (amplicons 61-70), 
which were also not FIE-HA targets. The amplicons 
were placed in the centre of the overlap of the FIE- 

Figure 1 FIE-HA and H3K27me3 distribution at the LEC2 and BP loci. Browser images showing the FIE-HA and H3K27me3 distribution across 
the LEC2 and BP loci. Black arrows are locations of GAGA motifs identified in LEC2 [25], red arrows are AS1-AS2 binding sites in the BP promoter [27]. 

Figure 2 H3K27me3 distribution is conserved between C24 and Col. (a) Numbers of genes corresponding to an H3K27me3 ChIP-seq peak 
in C24 (this study) and Col [16] showing the overlap between the two datasets. (b) H3K27me3 ChIP-qPCR enrichment normalised to input DNA 
for amplicons from regions identified as enriched in C24 and Col (1-10), C24 only (11-20), Col only (21-30) and neither (61-70). An amplicon 
from FLC (Amplicon 5a) [35] is included for comparison. 



HA+H3K27me3 peaks or the centre of the FIE-HA or 
H3K27me3 only peaks. For ChIP-qPCR, new sample sets 
immunoprecipitated for FIE-HA and H3K27me3 were 
prepared. As a positive control we carried out qPCR 
with FLC amplicon 5a located in the first exon of FLC 
which has detectable FIE-HA binding and low but de- 
tectable H3K27me3 [35]. We found enrichment of 
H3K27me3 at all 20 H3K27me3 sites tested, confirming 
that the number of false positives in the H3K27me3 
dataset is low (Figure 3e). Of the 10 FIE-HA only re- 
gions, five showed a higher enrichment of H3K27me3 
above background than the FLC control primers, again 
showing that there are false negatives for H3K27me3 
(Figure 3e). The regions selected as having neither 
H3K27me3 nor FIE-HA present had very low levels of 
H3K27me3. The ChIP-qPCR for FIE-HA verified 8 of 10 
FIE-HA+H3K27me3 peaks (which matches the false discovery 
rate predicted in Additional file 1: Table S1B), 3 of 
9 FIE-HA only peaks and showed no enrichment for the 20 
regions where FIE-HA was not detected in the ChIP-seq 
analysis (Figure 3d). This indicates that the FIE-HA+ 
H3K27me3 peaks are the most reliable of the FIE binding 
sites we identified, while the majority of the peaks identified 
as FIE-HA only are likely to be false positives. Therefore we 
focused on the FIE-HA+H3K27me3 peaks in subsequent 
analyses. We noted that the abundance of H3K27me3 at 
the H3K27me3-only sites was on average significantly lower 
than that for the FIE-HA+H3K27me3 sites (Figure 3e). 
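
The grouping of peaks by an overlap of 1 bp or more can be illustrated with a short sketch. This is not the pipeline used in this study; the function names, the half-open (chromosome, start, end) tuple representation and the toy coordinates are assumptions made purely for illustration.

```python
from collections import defaultdict

def overlaps(a, b, min_overlap=1):
    """True if a and b (chrom, start, end; end exclusive) share >= min_overlap bp."""
    if a[0] != b[0]:
        return False
    shared = min(a[2], b[2]) - max(a[1], b[1])
    return shared >= min_overlap

def classify_peaks(fie_peaks, k27_peaks, min_overlap=1):
    """Split peaks into FIE-HA+H3K27me3, FIE-HA only and H3K27me3 only groups."""
    k27_by_chrom = defaultdict(list)
    for peak in k27_peaks:
        k27_by_chrom[peak[0]].append(peak)

    both, fie_only = [], []
    k27_hit = set()
    for fie in fie_peaks:
        hits = [k for k in k27_by_chrom[fie[0]] if overlaps(fie, k, min_overlap)]
        if hits:
            both.append(fie)
            k27_hit.update(hits)
        else:
            fie_only.append(fie)
    k27_only = [k for k in k27_peaks if k not in k27_hit]
    return both, fie_only, k27_only

# Toy coordinates (hypothetical, not taken from the datasets)
fie = [("Chr1", 100, 300), ("Chr1", 1000, 1200), ("Chr2", 50, 90)]
k27 = [("Chr1", 299, 800), ("Chr1", 1200, 1500), ("Chr2", 500, 900)]
both, fie_only, k27_only = classify_peaks(fie, k27)
```

In the toy data the first FIE-HA peak shares exactly one base with an H3K27me3 peak and is therefore placed in the combined group, while the second peak merely abuts an H3K27me3 peak and is not.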

FIE-HA is enriched across gene body regions 

The distribution of FIE-HA and H3K27me3 was determined 
by plotting the ChIP enrichment of H3K27me3 
and FIE-HA in a gene-centric manner. We observed the 
characteristic enrichment of H3K27me3 across gene 
bodies (Figure 3b) [16]. FIE-HA is also enriched across 
gene bodies (Figure 3b), but with greater enrichment 
at the 5' ends in comparison to the distribution of 
H3K27me3. The size distributions of the FIE-HA+H3K27me3 
peaks (FIE-HA peaks that overlap with H3K27me3 
peaks) and all H3K27me3 peaks were compared (Figure 3c) 
and found to show a similar distribution at size ranges up 
to about 1.5 kb. 

FIE-HA genes are enriched for developmental functions 

The genes associated with FIE-HA+H3K27me3 and 
H3K27me3 only peaks were used to carry out a gene 
ontology (GO) analysis on the two datasets (Figure 4), to 
determine whether there were any differences between 
the classes of genes with high-confidence FIE binding 
peaks (FIE-HA+H3K27me3) and the bulk of the H3K27me3 
targets in the genome. The gene classifications 
that were enriched in the two datasets were similar over- 
all, however, the GO categories for multicellular organism 
development, post-embryonic development, flower de- 
velopment and reproduction were more enriched in the 
FIE-HA+H3K27me3 data. In addition there were a 
number of individual categories that were only present 
in one of the datasets; most of these categories were 
from the H3K27me3 dataset, which has the higher num- 
ber of genes. 

As there were differences in the functional catego- 
risation of the genes in the FIE-HA+H3K27me3 and 
H3K27me3 only gene groups, we determined whether 
these groups of genes differed in the extent to which 
their expression was under PcG regulation. We analysed 
gene expression in seedlings of the clf-7 swn-28 double 
mutant [13] and a pool of intermediate phenotype T1 
plants carrying an RNAi construct against FIE [11] 
(siFIE; Additional file 5: Figure S1) using Nimblegen arrays 
(Figure 5; Additional file 6: Dataset S4). The results 
from the array experiment were verified by RT-qPCR of 
six genes in both clf-7 swn-28 and siFIE compared to 
wildtype (Additional file 7: Figure S2). The clf-7 swn-28 
mutant shows a more severe phenotype than the siFIE 
plants and this is reflected in a greater number of gene 
expression changes in the clf-7 swn-28 mutant (Figure 5). 
The primary effect expected from loss of PcG function is 
up-regulation of genes that are repressed by the pres- 
ence of H3K27me3. However, it is clear that there are 
many secondary changes in gene expression, which can 
be seen in the large number of down- and up-regulated 
genes that are not associated with H3K27me3 in the clf-7 
swn-28 and siFIE expression data. 

Figure 4 Gene Ontology analysis of genes in the FIE-HA+H3K27me3 and H3K27me3 only datasets. GO analysis was carried out using 
BiNGO [45]. Enriched categories for biological process, molecular function and cellular component are shown. GOslim categories with significant 
enrichment in the dataset were highlighted in colour with different colours representing different levels of significance. The orange categories are 
most significantly overrepresented. White nodes are not significantly overrepresented. The area of a node is proportional to the number of genes 
in the corresponding GO category. 

The genes that are 
marked by H3K27me3 in wildtype plants and are up-regulated 
in clf-7 swn-28 and/or siFIE include those 
where the presence of H3K27me3 is important in repressing 
gene expression; there were 610 such genes in 
the clf-7 swn-28 dataset and 124 in the siFIE data with 
86 genes common to both datasets (Figure 5b). We 
compared the frequency at which genes associated with 
FIE-HA+H3K27me3 or H3K27me3 were up-regulated; 
we found that there was about a two-fold higher propor- 
tion of the FIE-HA+H3K27me3 genes up-regulated. For 
example 28% of the FIE-HA+H3K27me3 genes were up-regulated 
in clf-7 swn-28 compared to 12% of the 
H3K27me3 alone genes. Hence genes where FIE is 
present are more likely to be those that are primarily 
maintained in a repressed state by H3K27me3 and for 
which activators of expression are present in the seed- 
ling tissue used in these experiments. This was also 
reflected in the GO profiles of the genes up-regulated in 
clf-7 swn-28 and siFIE which were similar to that of the 
FIE-HA+H3K27me3 genes (Figure 4; Additional file 8: 
Figure S3, Additional file 9: Figure S4, Additional file 10: 
Figure S5). 
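
The roughly two-fold difference quoted above follows directly from the gene counts reported in Figure 5b. The short calculation below reproduces the percentages for the clf swn comparison; the variable and function names are ours and are only illustrative.

```python
# Gene counts as reported in Figure 5b
peak_genes = {"FIE-HA+H3K27me3": 754, "H3K27me3 only": 3222}
up_in_clf_swn = {"FIE-HA+H3K27me3": 210, "H3K27me3 only": 400}

def percent_up(n_up, n_total):
    """Percentage of peak-associated genes that are up-regulated."""
    return 100.0 * n_up / n_total

pct = {group: percent_up(up_in_clf_swn[group], peak_genes[group])
       for group in peak_genes}
fold = pct["FIE-HA+H3K27me3"] / pct["H3K27me3 only"]

for group, value in pct.items():
    print(f"{group}: {value:.1f}% up-regulated in clf swn")
print(f"ratio between groups: {fold:.2f}")
```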

High confidence FIE binding peaks contain putative 
GAGA Factor binding sites 

If Arabidopsis PRC2 complexes are recruited through 
PRE-like sequences some conserved motifs would be 
expected to be associated with the FIE binding peaks identified 
by ChIP-seq. As the FIE-HA+H3K27me3 peaks were 
the highest confidence FIE binding sites, we used these 
peak sequences in a MEME analysis. We searched for 
short motifs (~8 bp) on the basis that the known inter- 
action sites of PRE-binding proteins in Drosophila are 
of approximately this length. This identified four motifs 
in the FIE-HA+H3K27me3 peaks (Figure 6a). We also 
carried out the same analysis on the H3K27me3 genes 
as well as the lower confidence FIE-HA only peaks and 
a control set of random promoter sequences (Additional 
file 11: Figure S6). Motif 1 was present in all four analyses 
and so was not considered further. There were no add- 
itional motifs found in the H3K27me3 only data, in the 
FIE-HA only data there were two additional motifs which 
were similar to motif 3 in the FIE-HA+H3K27me3 peaks. 
Comparison of the motifs in the FIE-HA+H3K27me3 
peaks to motif databases using TOMTOM identified motif 
2 as similar to the TBF1 binding site (telobox factor 1), 
motif 3 as similar to a zinc finger protein binding site and 
motif 4 as similar to the GAGA Factor 1/Trithorax-like 
binding site (Additional file 12: Figure S7). We plotted the 
location of the putative TBF1 and GAGA Factor binding 
sites relative to gene bodies and found they mirrored the 
location of FIE-HA+H3K27me3 peaks, being enriched 
across transcription units (Figure 6b). 

Discussion 

We have carried out a ChIP-seq analysis to identify 
binding sites for the PRC2 component FIE across the 
Arabidopsis genome. In comparison to the numbers of 
H3K27me3 sites (5148) we identified fewer high confi- 
dence sites of FIE binding (723). As PRC2, and therefore 
FIE, is required to deposit and maintain H3K27me3, a 
similar number of peaks is expected in both experiments. 
The discrepancy could have some biological 
significance or be a technical artefact. A technical differ- 
ence could be simply a consequence of the two ChIP 
experiments using different antibodies and targeting 
proteins that interact with chromatin in different ways. 
Histone H3 is an intrinsic part of the nucleosome 
structure while FIE is part of a protein complex that 
interacts with chromatin. Although the FIE-HA sample 
was cross-linked prior to immunoprecipitation, the in- 
direct nature of the interaction between FIE and the 
DNA that is assayed by ChIP may make it harder to de- 
tect FIE binding regions compared to H3K27me3 
regions. 

Biological explanations for the low number of FIE-HA 
peaks compared to H3K27me3 peaks could be that there 
are differences in the number of PRC2 binding sites, the 
strength of PRC2 binding or the amount of time that 
PRC2 is present at a given locus. There is support for 
this last possibility from FRAP (Fluorescence Recovery 
After Photobleaching) studies in Drosophila [36] which 
suggest that polycomb complexes are not constantly 
bound to chromatin and that the rate of assembly of 
polycomb complexes differs between loci. Genome-wide 
comparison of H3K27me3 and PRC2 in Drosophila also 
identified 'weak' PcG sites [37] where H3K27me3 but 
not PRC2 was detected. The average abundance of 
H3K27me3 at the FIE-HA+H3K27me3 peaks was sig- 
nificantly higher than at the H3K27me3 only regions 
(Figure 3e). The genes associated with FIE-HA+ 
H3K27me3 regions were also more likely to be up- 
regulated in plants that have reduced PcG function. We 



Deng et al. BMC Genomics 2013, 14:593 
http://www.biomedcentral.com/1471-2164/14/593 






Figure 5 FIE-HA+H3K27me3 genes are enriched for genes 
under PcG repression in vegetative tissues. (a) Numbers of 
genes up- and down-regulated in clf-7 swn-28 and siFIE seedlings 
compared to wildtype. Overlap of Venn diagram indicates the 
numbers of genes common to both datasets. (b) Numbers of genes 
up-regulated in clf-7 swn-28, siFIE or both that are also associated 
with a FIE-HA+H3K27me3 or H3K27me3 only ChIP-seq peak. 
Percentages are the percentage of genes associated with a ChIP-seq 
peak that are also in the up-regulated gene lists. 

(a) Venn diagram values 

                   clf swn    common    siFIE 
Up-regulated       2024       151       254 
Down-regulated     2625       73        105 

(b) Genes associated with ChIP-seq peaks 

Up-regulated genes       FIE + H3K27me3    H3K27me3 only    H3K27me3 total 
                         (754)             (3222)           (3976) 
clf swn (2024)           210 (27.9%)       400 (12.4%)      610 (15.3%) 
siFIE (254)              35 (4.6%)         89 (2.8%)        124 (3.1%) 
siFIE + clf swn (151)    28 (3.7%)         58 (1.8%)        86 (2.2%) 



speculate that these genes are ones for which activators 
are present in vegetative tissues (with the activators 
regulating other genes) and hence there is a selection for 
increased PRC2 occupancy to maintain high levels of 
H3K27me3 and repression of gene expression. 

A search for sequence motifs in the high confidence 
FIE binding sites identified four short conserved motifs. 
One of these was identified as being similar to the 
GAGA factor binding site which is a component of 
Drosophila PREs. The GAGA factor binds to many 
Drosophila PREs, but is also found in active promoters 
[38] and is suggested to have roles in nucleosome deple- 
tion and PcG recruitment. In plants the GAGA motif is 
often found within core promoter sequences [39]; however 
the GAGA motifs identified through FIE-HA ChIP-seq 
are predominantly located in gene bodies. They are 
not found in analyses of random sequences, indicating 
that there is a positive association with PRC2. The 
H3K27me3 only sites did not contain the GAGA motif 
and had lower levels of H3K27me3 than the FIE-HA+ 
H3K27me3 peaks. The FIE-HA+H3K27me3 associated 
genes are also more likely to be up-regulated in plants 

Figure 6 MEME analysis of sequences in FIE-HA+H3K27me3 
peaks identifies enriched sequence motifs. (a) Motifs enriched in 
the FIE+H3K27me3 peaks determined by MEME analysis. (b) 
Distribution of motifs 2 and 4 with respect to gene features. 



which have a loss of PRC2 function. Based on this we 
speculate that the GAGA motif has a role in strengthen- 
ing Polycomb recruitment to target genes for which 
Polycomb regulation is the primary mode of repression. 

One element that functions in a PRE-like manner in 
plants is the RLE element in the Arabidopsis LEC2 
promoter [25]. RLE is located near a GAGA element; 
however this GAGA element is not required for the 
function of RLE. RLE is at one edge of a region of FIE- 
HA binding (Figure 1a), which also includes the GAGA 
element. The PRE-like sequence identified in the BP 
promoter does not have any associated FIE-HA binding 
in our high-confidence dataset, although there is evi- 
dence of FIE-HA binding across the BP gene body 
(Figure 1b). The BP PRE-like region has been shown to 
bind CLF-GFP expressed from a strong 35S promoter 
[27], so it may be a site of weaker PRC2 interaction. The 
GAGA motif has also been identified in association with 
LFY binding sites [40]; while some of these genes are 
also H3K27me3 targets, many are not. This suggests that 
in plants, as for Drosophila [38], the role of GAGA 
factor is wider than PcG function. 

Although our data suggest that the GAGA factor may 
be a common component in the PcG regulation mech- 
anism in plants as well as in flies, there are differences 
in the structures of the regions occupied by PRC2 and 
H3K27me3. We did not find evidence for relatively nar- 
row regions of PRC2 binding with low H3K27me3 and 
depleted of nucleosomes, flanked by wide regions of 
H3K27me3, as seen at many Drosophila PcG targets. 
The observed co-localisation of FIE-HA and H3K27me3 
is more reminiscent of the data in mammalian systems 
[41]. The association of H3K27me3 and FIE-HA binding 
with gene body regions appears to be particularly strong 
in plants compared to both mammals and insects. 

Conclusions 

We have used a genome-wide ChIP-seq approach to 
identify FIE and hence PRC2 binding sites across the 
Arabidopsis genome. Based on our high-confidence 
dataset we find that the regions of PRC2 binding are 
largely within gene body regions and co-localise with 
H3K27me3. The emerging reports of plant PREs and 
our finding of GAGA motifs at FIE binding sites suggest 
that DNA binding proteins have a role in recruiting 
PRC2 in plants and that further dissection of potential 
PRE-like regions could help our understanding of how 
the PcG system is recruited to specific genes in plants. 

Methods 

Plant material 

All plants were grown on MS agar plates in a 16 h 
light: 8 h dark photoperiod under fluorescent lights at 
22°C for 12 days. Whole seedlings were harvested for 
ChIP experiments or RNA extraction. T1 siFIE plants 
(in Col ecotype) were selected on plates supplemented 
with kanamycin (50 mg L⁻¹). The swn-7 clf-28 mutant 
is sterile and was selected from the progeny of swn-7 
clf-28 /+ plants. 



ChIP-seq and bioinformatic analysis 

Native chromatin immunoprecipitation (N-ChIP) was per- 
formed as described previously [42] with minor modifica- 
tions. In brief, Arabidopsis seedlings were collected and 
ground in liquid nitrogen. Nuclei were extracted with 
buffers 1, 2 and 3 and chromatin was digested by MNase 
for 6 minutes to generate native chromatin templates 
consisting primarily of mononucleosomes. Native chro- 
matin templates were incubated with anti-H3K27me3 
antibody (07-449, Millipore) and antibody-bound DNA 
fragments were extracted. ChIP DNA fragments were 
sequenced by Illumina (San Diego, CA) with an Illumina 
Genome Analyzer (GAII) by standard procedures. A 
control sample of input DNA from the micrococcal nu- 
clease digested lysates before immunoprecipitation was 
also sequenced. 

Material for FIE-HA ChIP was cross-linked with formaldehyde 
and ChIP was carried out as previously described 
[35,43] on mononucleosome-sized micrococcal nuclease-digested 
lysates. Over 20 pull-downs were performed on 
sets of 1 g tissue for the FIE-HA line and the untransformed 
C24. ChIPs were selected for high enrichment 
relative to the C24 control by qPCR using a set of 6 
diagnostic amplicons (Additional file 13: Table S2). DNA 
was pooled from the 12 immunoprecipitates that had 
highest enrichment and used for ChIP-seq as above except 
that the sequencing was carried out by the Australian 
Genome Research Facility (Melbourne, Australia). As a 
control DNA extracted from the digested lysates before 
immunoprecipitation was also sequenced (input DNA). 

The numbers of sequence reads obtained for each 
sample are detailed in Additional file 1: Table S1. The 
sequencing reads were mapped to the Arabidopsis genome 
(TAIR9 build) using BioKanga (http://biokanga.sourceforge.net/), 
allowing 2 mismatches at any position. 
Peaks were identified using the log2 ratio of signal density 
between two samples to determine enriched regions 
as candidates, followed by significance analysis on the 
read density from candidate peaks [44]. Briefly, each 
read was extended L bp (L = 150 bp for H3K27me3 and 
L = 200 bp for FIE samples respectively) from the beginning 
of the 5' end to represent the fragment length. 
$S_x$ was the normalised number of the extended reads located 
within a 10 bp window along a chromosome for 
sample x; a log2 ratio was then calculated on the corresponding 
$S_x$ as $\log_2 R = \log_2(S_{\mathrm{treatment}}/S_{\mathrm{control}})$. Adjacent 
windows that have a $\log_2 R$ above a threshold (the 
threshold was 3-fold enrichment when compared with 
input) were merged to form candidate peaks and peaks 
that were separated by less than 200 bp were further 
merged. Finally a significance test was performed using 
the PeakSeq algorithm [44] where a p value was 
obtained from binomial test on the number of reads 
within a candidate peak and a multiple test correction 
was followed to give a q value for estimation of false 
discovery rate. A q score of 10'^" was selected on the 
basis of maximizing the number of predicted peaks in 
the immunoprecipitated sample and minimizing the 
numbers of peaks identified in the control input DNA 
samples. 
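
The candidate-peak step described above (read extension, 10 bp windows, log2 ratio against input, and merging) can be sketched as follows. This is a minimal illustration under stated assumptions and not the code used for this study: the function names, the `scale` normalisation factor and the pseudocount that avoids division by zero are our own choices, and the subsequent PeakSeq significance test is not reproduced.

```python
import math
from collections import Counter

WINDOW = 10  # bp window size, as described in the text

def window_signal(reads, extend, chrom_len, scale=1.0):
    """Scaled count of extended reads overlapping each 10 bp window.

    reads: list of (five_prime_position, strand), 0-based positions.
    extend: fragment length L (150 bp for H3K27me3, 200 bp for FIE in the text).
    scale: normalisation factor (assumption, e.g. per million mapped reads).
    """
    counts = Counter()
    for pos, strand in reads:
        if strand == "+":
            start, end = pos, pos + extend
        else:
            start, end = pos - extend + 1, pos + 1
        start, end = max(start, 0), min(end, chrom_len)
        for w in range(start // WINDOW, (end - 1) // WINDOW + 1):
            counts[w] += 1
    return {w: c * scale for w, c in counts.items()}

def candidate_peaks(treat, control, fold=3.0, pseudo=1.0, max_gap=200):
    """Merge windows whose log2(treatment/control) exceeds log2(fold).

    pseudo is a pseudocount added to both signals (assumption, not in the text).
    """
    threshold = math.log2(fold)
    enriched = sorted(
        w for w, s in treat.items()
        if math.log2((s + pseudo) / (control.get(w, 0.0) + pseudo)) > threshold
    )
    peaks = []
    for w in enriched:
        start, end = w * WINDOW, (w + 1) * WINDOW
        # Adjacent windows have a gap of 0 bp; gaps below max_gap are merged too.
        if peaks and start - peaks[-1][1] < max_gap:
            peaks[-1][1] = end
        else:
            peaks.append([start, end])
    return [tuple(p) for p in peaks]

# Toy window signals (hypothetical values)
treatment = {0: 8, 1: 8, 2: 1, 5: 8, 40: 8}
peaks = candidate_peaks(treatment, control={})
```

With the toy signals, windows 0, 1 and 5 are joined into one candidate because they lie within 200 bp of each other, whereas window 40 forms a separate candidate.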

GO analysis 

GO enrichment analysis was performed using the 
BiNGO 2.44 plug-in [45] in Cytoscape 2.8.3 [46] with 
the GOslim plants dataset. To test for enrichment, a 
hypergeometric test was conducted and the Benjamini 
and Hochberg false discovery rate was calculated. The 
network of the enriched categories was presented. 
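
For readers unfamiliar with the statistics behind this step, the hypergeometric enrichment test and the Benjamini and Hochberg correction can be written out in a few lines. The sketch below is our own illustration and does not reproduce BiNGO itself; the function names and the toy numbers are assumptions.

```python
from math import comb

def hypergeom_sf(k, M, n, N):
    """P(X >= k) when N genes are drawn from M, of which n carry the GO term."""
    upper = min(n, N)
    total = comb(M, N)
    return sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1)) / total

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical toy numbers: 1000 annotated genes, 50 in a category,
# 100 genes in the test set, 12 of which fall in the category.
p_single = hypergeom_sf(12, 1000, 50, 100)
q_values = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Each GO category gives one p-value from the hypergeometric test, and the whole list of p-values is then adjusted together to control the false discovery rate.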

Motif analysis 

MEME software (version 4.9.0) [47] was applied to yield 
over-represented motifs in the dataset. The width of the 
motif was set as 6 to 8 nucleotides. Zero or one per se- 
quence was used for the distribution of a single motif 
among the sequences. 
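
The "zero or one per sequence" setting means that each peak sequence contributes at most one occurrence of a given motif. The sketch below illustrates only that counting rule, using exact k-mers of width 6 to 8 on a single strand; it is not MEME, which fits probabilistic motif models, and the function name and toy sequences are hypothetical.

```python
from collections import Counter

def kmer_sequence_counts(sequences, widths=(6, 7, 8)):
    """For each k-mer, count the sequences that contain it at least once."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        seen = set()
        for k in widths:
            for i in range(len(seq) - k + 1):
                seen.add(seq[i:i + k])
        # A set is used so that each sequence adds at most 1 per k-mer.
        counts.update(seen)
    return counts

# Toy sequences (hypothetical, not from the peak datasets)
toy_peaks = ["TTGAGAGAGATT", "CCGAGAGAGACC", "AAAAAAAAAAAA"]
counts = kmer_sequence_counts(toy_peaks)
n_with_gaga = counts["GAGAGAGA"]
```

Comparing such per-sequence counts between peak sequences and a control set gives a rough feel for over-representation, although it lacks the statistical model that a dedicated motif finder provides.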

RNA extraction and qRT-PCR 

RNA was extracted from approximately 100 mg of seed- 
lings using the RNeasy Plant Mini Kit (Qiagen) according 
to the manufacturer's instructions. For quantitative RT- 
PCR, DNase-treated RNA was reverse transcribed using 
an oligo dT primer and SuperScript III reverse transcriptase 
(Invitrogen, http://www.invitrogen.com/) and at least 
triplicate reactions were amplified using 7900HT Fast 
Real-Time PCR System (Applied Biosystems, http://www.appliedbiosystems.com/) 
with SYBR green. The primers 
used are listed in Additional file 13: Table S2. For verification 
of ChIP peaks qPCR was carried out using a set of 
genomic DNA standards that allows the comparison of 
values between amplicons [35]. 
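
One common way of using genomic DNA standards is to fit a standard curve per amplicon and convert Ct values to DNA quantities before comparing immunoprecipitated and input samples. The sketch below assumes that approach; the exact procedure of [35] is not described here, and the function names and toy Ct values are hypothetical.

```python
import math

def fit_standard_curve(quantities, cts):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept."""
    xs = [math.log10(q) for q in quantities]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def quantity_from_ct(ct, slope, intercept):
    """Invert the standard curve to obtain a DNA quantity from a Ct value."""
    return 10 ** ((ct - intercept) / slope)

def enrichment(ip_ct, input_ct, curve):
    """Ratio of immunoprecipitated to input DNA quantity for one amplicon."""
    slope, intercept = curve
    return quantity_from_ct(ip_ct, slope, intercept) / quantity_from_ct(input_ct, slope, intercept)

# Toy dilution series of genomic DNA for a single amplicon (hypothetical values)
curve = fit_standard_curve([100, 10, 1, 0.1], [20.0, 23.3, 26.6, 29.9])
ratio = enrichment(ip_ct=26.6, input_ct=23.3, curve=curve)
```

Because each amplicon is quantified against its own curve, differences in primer efficiency do not distort comparisons between amplicons.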

Expression array analysis 

RNA was extracted from intermediate phenotype siFIE 
plants (Additional file 5: Figure S1), Col, clf-7 swn-28 
and Col using Qiagen Plant RNeasy mini kit. For each 
sample, three pools of 10-12 plants were used. The siFIE 
plants were analysed for FIE mRNA by RT-qPCR; the 
maximum level of FIE mRNA was found to be 10% of 
wildtype Col. 

RNA samples were hybridised to a Roche NimbleGen 
Arabidopsis Gene Expression 4x72K Array (catalogue 
number A4511001-00-01) representing 30,361 genes, 
each with 2 target probes as annotated by TAIR version 
6. DNAstar software was used for analysis; gene lists 
with higher than 2 fold de-regulation at 95% confidence 
compared to wild type were exported to Excel files for 
comparison. 



Availability of supporting data 

The Nimblegen array data in this publication have been 
deposited in NCBI's Gene Expression Omnibus [48] and 
are accessible through GEO Series accession number 
GSE48857 (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE48857). 
The raw sequence data have been deposited 
into the NCBI Short Read Archive, accession 
number SRP027413. 

Additional files 



Additional file 1: Table S1. Summarises the identification of peaks 
from the FIE-HA and H3K27me3 datasets. 

Additional file 2: Dataset S1. Lists all H3K27me3 ChIP-seq peaks. 

Additional file 3: Dataset S2. Lists all FIE-HA ChIP-seq peaks. 

Additional file 4: Dataset S3. Lists FIE-HA peaks coincident with 
H3K27me3 peaks. 

Additional file 5: Figure S1. Shows phenotypes of weak, intermediate 
and strong T1 siFIE plants in C24. 

Additional file 6: Dataset S4. Lists of up- and down-regulated genes 
from microarray analysis of siFIE and clf-7 swn-28 plants. 

Additional file 7: Figure S2. Shows RT-qPCR verification of microarray 
data. 

Additional file 8: Figure S3. Shows GO analysis of genes up-regulated 
in siFIE or clf swn and siFIE. 

Additional file 9: Figure S4. Shows GO analysis of genes up-regulated 
in siFIE or clf swn and siFIE that have FIE + H3K27me3 ChIP-seq peaks. 

Additional file 10: Figure S5. Shows GO analysis of genes up-regulated 
in siFIE, clf swn or both that have H3K27me3 only ChIP-seq peaks. 

Additional file 11: Figure S6. Shows additional MEME analyses to 
those shown in Figure 6. 

Additional file 12: Figure S7. Shows TOMTOM analyses of motifs 2 to 4. 

Additional file 13: Table S2. Lists oligonucleotide sequences used in 
this study. 